Approximate Nearest Neighbor Search on High Dimensional Data - Experiments, Analyses, and Improvement (v1.0)

نویسندگان

  • Wen Li
  • Ying Zhang
  • Yifang Sun
  • Wei Wang
  • Wenjie Zhang
  • Xuemin Lin
چکیده

Approximate Nearest neighbor search (ANNS) is fundamental and essential operation in applications from many domains , such as databases, machine learning, multimedia, and computer vision. Although many algorithms have been continuously proposed in the literature in the above domains each year, there is no comprehensive evaluation and analysis of their performances. In this paper, we conduct a comprehensive experimental evaluation of many state-of-the-art methods for approximate nearest neighbor search. Our study (1) is cross-disciplinary (i.e., including 16 algorithms in different domains, and from practitioners) and (2) has evaluated a diverse range of settings , including 20 datasets, several evaluation metrics, and different query workloads. The experimental results are carefully reported and analyzed to understand the performance results. Furthermore, we propose a new method that achieves both high query efficiency and high recall empirically on majority of the datasets under a wide range of settings.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Exact and Approximate Reverse Nearest Neighbor Search for Multimedia Data

Reverse nearest neighbor queries are useful in identifying objects that are of significant influence or importance. Existing methods either rely on pre-computation of nearest neighbor distances, do not scale well with high dimensionality, or do not produce exact solutions. In this work we motivate and investigate the problem of reverse nearest neighbor search on high dimensional, multimedia dat...

متن کامل

HDIdx: High-dimensional indexing for efficient approximate nearest neighbor search

Fast Nearest Neighbor (NN) search is a fundamental challenge in large-scale data processing and analytics, particularly for analyzing multimedia contents which are often of high dimensionality. Instead of using exact NN search, extensive research efforts have been focusing on approximate NN search algorithms. In this work, we present “HDIdx”, an efficient high-dimensional indexing library for f...

متن کامل

Fast Approximate Nearest Neighbors with Automatic Algorithm Configuration

For many computer vision problems, the most time consuming component consists of nearest neighbor matching in high-dimensional spaces. There are no known exact algorithms for solving these high-dimensional problems that are faster than linear search. Approximate algorithms are known to provide large speedups with only minor loss in accuracy, but many such algorithms have been published with onl...

متن کامل

Approximate Nearest Line Search in High Dimensions

We consider the Approximate Nearest Line Search (NLS) problem. Given a set L of N lines in the high dimensional Euclidean space R, the goal is to build a data structure that, given a query point q ∈ R, reports a line ` ∈ L such that its distance to the query is within (1+ ) factor of the distance of the closest line to the query point q. The problem is a natural generalization of the well-studi...

متن کامل

An Investigation of Practical Approximate Nearest Neighbor Algorithms

This paper concerns approximate nearest neighbor searching algorithms, which have become increasingly important, especially in high dimensional perception areas such as computer vision, with dozens of publications in recent years. Much of this enthusiasm is due to a successful new approximate nearest neighbor approach called Locality Sensitive Hashing (LSH). In this paper we ask the question: c...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:
  • CoRR

دوره abs/1610.02455  شماره 

صفحات  -

تاریخ انتشار 2016